Serial No.: 10/685,656 
Art Unit: 2442 
Page 2 

AMENDMENTS 

In the Claims 

The following is a marked-up version of the claims with the language that is underlined 
(" ") being added and the language that contains strikethrough (" — ") being deleted: 

1. (Currently Amended) A method comprising:
(A) receiving a first email message from a simple mail transfer protocol (SMTP) server,
the first email message comprising displaying characters and non-displaying characters, the 
non-displaying characters including non-displaying comments and non-displaying control 
characters; the first email message further comprising: 

(A1) a 32-bit string indicative of a length of the first email message;

(A2) a text body;

(A3) an SMTP email address that includes a user name and a domain name;

(A4) an attachment;
(B) searching for the non-displaying characters in the first email message;
(C) removing the non-displaying characters, including the non-displaying
comments and the non-displaying control characters;

(D) determining non-alphabetic displaying characters in the first email message,
where determining the non-alphabetic displaying characters includes a per-character analysis 
that recursively determines for each character whether: 

(D1) a character is a non-alphabetic character;

(D2) if the character is a non-alphabetic character, whether the character is a space;

(D3) if the character is a space, determine whether the space is adjacent to a solitary
'I' or 'a'; and

(D4) if the non-alphabetic character is not a space, filtering the determined
non-alphabetic displaying characters from the first email message;

(E) generating a phonetic equivalent for each word that includes only alphabetic displaying 
characters that has a phonetic equivalent; 

(F) tokenizing the phonetic equivalents in a displaying portion of the text body to
generate a plurality of body tokens representative of words in the text body;

(G) tokenizing the SMTP email address to generate an address token representative of the
SMTP email address;

(H) tokenizing the domain name to generate a domain token that is representative of the
domain name;

(I) tokenizing the attachment to generate an attachment token that is representative of the
attachment, wherein tokenizing comprises: 

generating a 128-bit MD5 hash of the attachment; 

appending the 32-bit string to the generated MD5 hash to produce a 160-bit 
number; and 

UUencoding the 160-bit number to generate the attachment token representative
of the attachment; 

(J) determining a corresponding spam probability value for each of the plurality of body
tokens, the address token, the domain token, and the attachment token;

determining whether at least one of the plurality of body tokens, the address token, the 
domain token, and the attachment token is present in a database of tokens and, in response to 
a determination that at least one of the plurality of body tokens, the address token, the domain 
token, and the attachment token is present in the database of tokens: 

updating the spam probability value of the plurality of body tokens, the address token,
the domain token, and the attachment token; and

(K) sorting the plurality of body tokens, the address token, the domain token, and the
attachment token in accordance with the corresponding spam probability value to determine a
predefined number of interesting tokens, the predefined number of interesting tokens being a
subset of the plurality of body tokens, the address token, the domain token, and the
attachment token;

(L) classifying the plurality of body tokens, the address token, the domain token, and the
attachment token as spam, non-spam, or neutral;

(M) selecting the predefined number of interesting tokens, to create selected interesting
tokens, the selected interesting tokens being the plurality of body tokens, the address
token, the domain token, and the attachment token having a greatest non-neutral probability
values;

(N) performing a Bayesian analysis on the selected interesting tokens to generate a spam
probability;

(O) categorizing the first email message as a function of the spam probability; and

filtering a second email message. 
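The attachment tokenization recited in step (I) of claim 1 can be illustrated with a minimal Python sketch. The function name `attachment_token`, the big-endian packing of the 32-bit length string, and the use of a single uuencoded line are assumptions made for illustration; the claim itself specifies only the MD5 hash, the appended 32-bit string, and the UUencoding.

```python
import binascii
import hashlib
import struct


def attachment_token(attachment: bytes, message_length: int) -> str:
    """Sketch of step (I): MD5 hash + 32-bit length string, then uuencode."""
    digest = hashlib.md5(attachment).digest()         # 128-bit MD5 hash (16 bytes)
    length_field = struct.pack(">I", message_length)  # 32-bit string; byte order is an assumption
    combined = digest + length_field                  # 160-bit number (20 bytes)
    # b2a_uu encodes one uuencoded line (at most 45 input bytes).
    # backtick=True writes zero groups as '`' so the token holds no spaces.
    line = binascii.b2a_uu(combined, backtick=True)
    return line.decode("ascii").rstrip("\n")


if __name__ == "__main__":
    print(attachment_token(b"example attachment bytes", 2048))
```

Because the token is derived from the attachment content and the message length rather than from the file name, two messages carrying the same attachment at the same length would map to the same token under these assumptions.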



2.-5. (Canceled) 



6. (Currently Amended) A method comprising: 

receiving, at a computing device, a first email message comprising a text body, an
SMTP email address, an attachment, and a domain name corresponding to the SMTP email 
address, the text body including displaying characters and non-displaying characters; 
searching for the non-displaying characters in the first email message;
removing the searched non-displaying characters, including non-displaying comments
and non-displaying control characters;

tokenizing the SMTP email address to generate an address token representative of the
displaying characters of the SMTP email address;

tokenizing the attachment to generate an attachment token that is representative of the
attachment; 

tokenizing the domain name to generate a domain token representative of the domain 

name; 

determining a corresponding spam probability value from the address token, the
attachment token, and the domain token;

determining whether at least one of the address token, the attachment token, and the
domain token is present in a database of tokens and, in response to a determination that at
least one of the address token, the attachment token, and the domain token is present in the
database of tokens:

updating the spam probability value of at least one of the address token, the
attachment token, and the domain token;

sorting the address token, the attachment token, and the domain token in accordance with
the corresponding spam probability value to determine a predefined number of interesting
tokens, the predefined number of interesting tokens being a subset of the address token,
the attachment token, and the domain token; and

filtering a second email message. 
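The sorting step of claim 6 can be sketched as follows. This is an illustration only: the function name `select_interesting`, the neutral value of 0.5, and the default count of 15 are assumptions, and ranking by distance from the neutral value is one plausible reading of sorting "in accordance with the corresponding spam probability value".

```python
def select_interesting(token_probs: dict[str, float], count: int = 15,
                       neutral: float = 0.5) -> list[tuple[str, float]]:
    """Return the `count` tokens whose probabilities lie farthest from neutral."""
    ranked = sorted(
        token_probs.items(),
        key=lambda item: abs(item[1] - neutral),  # distance from the neutral value
        reverse=True,                             # most decisive tokens first
    )
    return ranked[:count]


if __name__ == "__main__":
    probs = {"addr:x": 0.99, "domain:y": 0.50, "attach:z": 0.05, "free": 0.80}
    print(select_interesting(probs, count=2))
```

The result is a subset of the input tokens, which matches the claim's requirement that the interesting tokens be a subset of the address, attachment, and domain tokens.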

7.-10. (Canceled) 



11. (Currently Amended) The method of claim 6, wherein determining the spam
probability comprises:

assigning an address spam probability value to the address token representative of the
SMTP email address;

assigning a domain spam probability value to the domain token representative of the
domain name; and

generating a Bayesian probability value using the address spam probability and the
domain spam probability assigned to the address token and the domain token.

12. (Currently Amended) The method of claim 11, wherein determining the spam 
probability further comprises: 

comparing the generated Bayesian probability value with a predefined threshold value. 

13. (Currently Amended) The method of claim 12, wherein determining the spam 
probability further comprises: 

categorizing the first email message as spam in response to the Bayesian probability 
value being greater than the predefined threshold. 

14. (Currently Amended) The method of claim 12, wherein determining the spam 
probability further comprises: 

categorizing the first email message as non-spam in response to the Bayesian 
probability value being not greater than the predefined threshold. 
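Claims 11 through 14 can be read together as a combine-then-threshold procedure. The sketch below assumes the naive Bayes combining rule commonly used in token-based spam filters, a threshold of 0.9, and clamping of each probability to the range 0.01 to 0.99; none of these specifics is recited in the claims, and the function names are hypothetical.

```python
import math


def bayesian_probability(token_probs: list[float]) -> float:
    """Combine per-token spam probabilities into one value (assumed rule)."""
    # Clamp so that a single 0.0 or 1.0 cannot cause a division by zero.
    clamped = [min(max(p, 0.01), 0.99) for p in token_probs]
    spam = math.prod(clamped)
    ham = math.prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


def categorize(token_probs: list[float], threshold: float = 0.9) -> str:
    """Spam if the combined value is greater than the threshold, else non-spam."""
    return "spam" if bayesian_probability(token_probs) > threshold else "non-spam"


if __name__ == "__main__":
    # Hypothetical address and domain spam probability values.
    print(categorize([0.99, 0.90]))
    print(categorize([0.20, 0.10]))
```

The comparison is strict, so a combined value exactly equal to the threshold is categorized as non-spam, consistent with the "not greater than" wording of claim 14.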



15. (Canceled)

16. (Currently Amended) The method of claim 6, wherein receiving the first email
message further comprises:

receiving the first email message including a text body.

17. (Currently Amended) The method of claim 16, further comprising: 
tokenizing the words in the text body to generate body tokens representative of the 

words in the text body. 

18. (Canceled) 

19. (Currently Amended) The method of claim 17, wherein determining the spam 
probability comprises: 

assigning a body spam probability value to each of the body tokens representative of the 
words in the text body; 

assigning an attachment spam probability value to the attachment token representative
of the attachment; and

generating a Bayesian probability value using the body spam probability value and the
attachment spam probability value assigned to the body tokens and the attachment token.

20. (Currently Amended) The method of claim 19, wherein determining the spam 
probability further comprises: 

comparing the Bayesian probability value with a predefined threshold value.

21. (Currently Amended) The method of claim 20, wherein determining the spam
probability further comprises:

categorizing the first email message as spam in response to the Bayesian probability 
value being greater than the predefined threshold. 

22. (Currently Amended) The method of claim 20, wherein determining the spam 
probability further comprises: 

categorizing the first email message as non-spam in response to the Bayesian 
probability value being not greater than the predefined threshold. 

23. (Currently Amended) A system comprising: 

a memory component that stores at least the following: 

email receive logic configured to receive a first email message comprising an
SMTP email address, a domain name corresponding to the SMTP email address, and an 
attachment, the first email message further including displaying characters and non-displaying 
characters; 

searching logic configured to search for the non-displaying characters in the
first email message;

removing logic configured to remove the non-displaying characters,
including non-displaying comments and the non-displaying control characters;

tokenize logic configured to tokenize the SMTP email address to generate an
address token representative of the SMTP email address;

tokenize logic configured to tokenize the attachment to generate an attachment
token that is representative of the attachment;

tokenize logic configured to tokenize the domain name to generate a domain
token representative of the domain name;

analysis logic configured to determine a corresponding spam probability value
from the address token, the attachment token, and the domain token; and

determine logic configured to determine whether at least one of the address 
token, the attachment token, and the domain token is present in a database of tokens and, in 
response to a determination that at least one of the address token, the attachment token, and 
the domain token is present in the database of tokens: 

update the corresponding spam probability value of the address token,
the attachment token, and the domain token;

sort the address token, the attachment token, and the domain token in
accordance with the corresponding spam probability value to determine a predefined
number of interesting tokens, the predefined number of interesting tokens being a
subset of the address token, the attachment token, and the domain token, wherein
only displaying characters are tokenized; and

filter a second email message.
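The database lookup and update recited in claim 23 can be sketched as below. The claim does not say how the spam probability value is recomputed, so the count-based formula, the `TokenRecord` structure, and the function names are all assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenRecord:
    """Hypothetical database entry: how often a token was seen in spam and non-spam."""
    spam_count: int = 0
    ham_count: int = 0


def spam_probability(record: TokenRecord, total_spam: int, total_ham: int) -> float:
    """Relative spam frequency of a token (assumed formula)."""
    spam_freq = record.spam_count / max(total_spam, 1)
    ham_freq = record.ham_count / max(total_ham, 1)
    if spam_freq + ham_freq == 0:
        return 0.5  # no evidence either way: treat as neutral
    return spam_freq / (spam_freq + ham_freq)


def update_tokens(database: dict[str, TokenRecord], tokens: list[str],
                  is_spam: bool, total_spam: int, total_ham: int) -> dict[str, float]:
    """Update the probability of every token that is present in the database."""
    updated = {}
    for token in tokens:
        record = database.get(token)
        if record is None:
            continue  # token not present in the database: nothing to update
        if is_spam:
            record.spam_count += 1
        else:
            record.ham_count += 1
        updated[token] = spam_probability(record, total_spam, total_ham)
    return updated
```

Only tokens already present in the database are touched, mirroring the conditional "in response to a determination that at least one of the ... tokens is present" language.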

24. (Currently Amended) A system comprising:

means for receiving a first email message comprising an SMTP email address, a
domain name corresponding to the SMTP email address, and an attachment, the first email 
message further including displaying characters and non-displaying characters; 

means for searching for the non-displaying characters in the first email message;

means for removing the searched non-displaying characters, including the
non-displaying comments and the non-displaying control characters;

means for tokenizing the SMTP email address to generate an address token
representative of the SMTP email address;

means for tokenizing the attachment to generate an attachment token that is
representative of the attachment;

means for tokenizing the domain name to generate a domain token representative of the
domain name;

means for determining a corresponding spam probability value from the
address token, the attachment token, and the domain token;

means for determining whether at least one of the address token, the attachment token,
and the domain token is present in a database of tokens; and

means for, in response to a determination that at least one of the address token, the 
attachment token, and the domain token is present in the database of tokens: 

updating the spam probability value of the address token, the attachment token,
and the domain token;

sorting the address token, the attachment token, and the domain token in
accordance with the corresponding spam probability value to determine a predefined
number of interesting tokens, the predefined number of interesting tokens being a
subset of the address token, the attachment token, and the domain token, wherein
only displaying characters are tokenized; and
filtering a second email message.
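The address and domain tokenization recited in claim 24 can be pictured with a short sketch. The `addr:` and `domain:` prefixes, the lower-casing, and the function name are assumptions introduced here so the two token kinds stay distinguishable; the claim requires only that each token be representative of its source.

```python
def address_and_domain_tokens(smtp_address: str) -> tuple[str, str]:
    """Return (address token, domain token) for an address of the form user@domain."""
    address = smtp_address.strip().lower()  # lower-casing is a simplifying assumption
    user, separator, domain = address.rpartition("@")
    if not separator or not user or not domain:
        raise ValueError(f"not a user@domain address: {smtp_address!r}")
    return f"addr:{address}", f"domain:{domain}"


if __name__ == "__main__":
    print(address_and_domain_tokens("Alice@Example.com"))
```

Keeping a separate domain token lets evidence accumulate against a sending domain even when every individual address from that domain is new.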

25. (Currently Amended) A computer-readable storage medium that includes a 
program that, when executed by a computer, performs at least the following: 

receive a first email message comprising an SMTP email address, a domain name
corresponding to the SMTP email address, and an attachment, the first email message further 
including displaying characters and non-displaying characters; 

search for the non-displaying characters in the first email message;

remove the searched non-displaying characters, including the non-displaying comments
and the non-displaying control characters;

tokenize the SMTP email address to generate an address token representative of the
SMTP email address;

tokenize the attachment to generate an attachment token that is representative of the
attachment; 

tokenize the domain name to generate a domain token representative of the domain 

name; 

determine a corresponding spam probability value from the address
token, the attachment token, and the domain token; and

determine whether at least one of the address token, the attachment token, and the 
domain token is present in a database of tokens and, in response to a determination that at 
least one of the address token, the attachment token, and the domain token is present in the 
database of tokens: 

update the corresponding spam probability value of the address token, the
attachment token, and the domain token;

sort the address token, the attachment token, and the domain
token in accordance with the corresponding spam probability value to determine a
predefined number of interesting tokens, the predefined number of interesting tokens being a
subset of the generated tokens, wherein only displaying characters are tokenized;
and

filter a second email message. 
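The search-and-remove steps for non-displaying characters in claim 25 can be sketched as follows. The sketch assumes that non-displaying comments are HTML comments of the form `<!-- ... -->` and that non-displaying control characters are the ASCII control codes other than tab, line feed, and carriage return; the claim does not fix either definition.

```python
import re

# Assumption: "non-displaying comments" are HTML comments.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Assumption: control characters are ASCII controls except \t, \n and \r.
CONTROL_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_non_displaying(text: str) -> str:
    """Remove non-displaying comments first, then non-displaying control characters."""
    without_comments = COMMENT_RE.sub("", text)
    return CONTROL_RE.sub("", without_comments)


if __name__ == "__main__":
    print(strip_non_displaying("vi<!-- hidden -->agra\x00 now"))
```

Removing such content before tokenization means a word split by a hidden comment is rejoined, so only displaying characters are tokenized.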

26. (Currently Amended) The computer-readable storage medium of claim 25, the
program further causing the computer to perform at least the following:

assign an address spam probability value to the address token representative of the
SMTP email address;

assign a domain spam probability value to the domain token representative of the
domain name; and

generate a Bayesian probability value using the address spam
probability value and the domain spam probability value assigned to the tokens.

27. (Currently Amended) The computer-readable storage medium of claim 26, the 
program further causing the computer to perform at least the following: 

compare the Bayesian probability value with a predefined threshold value.

28. (Currently Amended) The computer-readable storage medium of claim 27, the 
program further causing the computer to perform at least the following: 

categorize the first email message as spam in response to the Bayesian probability 
value being greater than the predefined threshold. 

29. (Currently Amended) The computer-readable storage medium of claim 27, the 
program further causing the computer to perform at least the following: 

categorize the first email message as non-spam in response to the Bayesian probability 
value being not greater than the predefined threshold. 

30. (Currently Amended) A system comprising:
a memory component that stores at least the following:

email receive logic configured to receive a first email message comprising an
attachment and an address, the email message further including displaying characters and
non-displaying characters;

search logic configured to search for the non-displaying characters in the
first email message;

remove logic configured to remove the non-displaying characters,
including the non-displaying comments and the non-displaying control characters;

tokenize logic configured to generate at least one attachment token
representative of the attachment;

analysis logic configured to determine a corresponding spam probability value
from the at least one attachment token; and

database determining logic configured to determine whether the at least one 
attachment token is present in a database of tokens and, in response to a determination that the 
at least one attachment token is present in the database of tokens: 

update the corresponding spam probability value of the at least one
attachment token;

sort the at least one attachment
token in accordance with the corresponding spam probability value to determine a predefined
number of interesting tokens, the predefined number of interesting tokens being a subset of the
at least one attachment token, wherein only displaying characters are
tokenized; and

filter a second email message. 

31. (Currently Amended) A system comprising:

means for receiving a first email message comprising an attachment and an address,
the first email message further including displaying characters and non-displaying characters;

means for searching for the non-displaying characters in the first email message;

means for removing the non-displaying characters, including the
non-displaying comments and the non-displaying control characters;

means for generating at least one attachment token representative of the attachment;

means for determining a spam probability value from the at least one
attachment token;

means for determining whether the at least one attachment token is present in a
database of tokens; and

means for, in response to a determination that the at least one attachment token is
present in the database of tokens:

updating the spam probability value of the at least one attachment token;
sorting the at least one attachment token in
accordance with the spam probability value to determine a
predefined number of interesting tokens, the predefined number of interesting tokens being a
subset of the generated tokens, wherein only displaying characters are tokenized;
and

filtering a second email message. 

32. (Currently Amended) A computer-readable storage medium that includes a 
program that, when executed by a computer, performs at least the following: 

receive a first email message comprising an attachment and an address, the first
email message further including displaying characters and non-displaying characters;

search for the non-displaying characters in the first email message;
remove the non-displaying characters, including the non-displaying comments
and the non-displaying control characters;

generate at least one attachment token representative of the attachment;
determine a spam probability value from the at least one attachment token;
and

determine whether the at least one attachment token is present in a database of tokens
and, in response to a determination that the at least one attachment token is present in the
database of tokens:

update the spam probability value of the at least one attachment token;

sort the at least one attachment token in accordance with the
spam probability value to determine a predefined number of
interesting tokens, the predefined number of interesting tokens being a subset of the generated
tokens, wherein only displaying characters are tokenized; and

filter a second email message. 

33. (Currently Amended) The computer-readable storage medium of claim 32, the 
program further causing the computer to perform at least the following: 

receive the first email message having a text body.

34. (Currently Amended) The computer-readable storage medium of claim 33, the 
program further causing the computer to perform at least the following: 

tokenize the words in the text body to generate body tokens representative of the words 
in the text body. 

35. (Currently Amended) The computer-readable storage medium of claim 34,
assign a body spam probability value to each of the body tokens representative of the
words in the text body;

assign an attachment spam probability value to the token representative of the
attachment; and

generate a Bayesian probability value using the attachment
spam probability and the body spam probability assigned to the body tokens and the
attachment token.

36. (Currently Amended) The computer-readable storage medium of claim 35, the 
program further causing the computer to perform at least the following: 

compare the Bayesian probability value with a predefined threshold value.

37. (Currently Amended) The computer-readable storage medium of claim 36, the 
program further causing the computer to perform at least the following: 

categorize the first email message as spam in response to the Bayesian probability 
value being greater than the predefined threshold. 

38. (Currently Amended) The computer-readable storage medium of claim 36, the 
program further causing the computer to perform at least the following: 

categorize the first email message as non-spam in response to the Bayesian probability 
value being not greater than the predefined threshold. 

39. (Currently Amended) The method of claim 1, wherein the first email message is
received at a computing device.

40. (New) The method of claim 1, further comprising, in response to a determination
that the space is not adjacent to a solitary 'I' or 'a,' deleting the non-alphabetic character.
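The per-character analysis of claim 1, step (D), together with claim 40, can be illustrated with a small sketch. It treats "solitary" as meaning a one-letter word bounded by spaces or by the ends of the text, and it uses a loop rather than recursion; both choices, and the function names, are assumptions rather than anything the claims require.

```python
SOLITARY_WORDS = {"I", "a"}  # the one-letter words named in the claims


def is_solitary(text: str, index: int) -> bool:
    """True if text[index] is 'I' or 'a' standing alone as a one-letter word."""
    if index < 0 or index >= len(text) or text[index] not in SOLITARY_WORDS:
        return False
    left_ok = index == 0 or text[index - 1] == " "
    right_ok = index == len(text) - 1 or text[index + 1] == " "
    return left_ok and right_ok


def filter_non_alphabetic(text: str) -> str:
    """Keep letters; keep a space only when it is adjacent to a solitary 'I' or 'a'."""
    kept = []
    for index, char in enumerate(text):
        if char.isalpha():
            kept.append(char)  # alphabetic displaying character: keep
        elif char == " ":
            if is_solitary(text, index - 1) or is_solitary(text, index + 1):
                kept.append(char)  # space next to a solitary 'I' or 'a': keep
            # any other space is deleted (claim 40)
        # every other non-alphabetic character is filtered out (step D4)
    return "".join(kept)


if __name__ == "__main__":
    print(filter_non_alphabetic("f.r.e.e"))
    print(filter_non_alphabetic("c-l-i-c-k h-e-r-e"))
```

Under this reading, punctuation inserted between letters to disguise a word is stripped, while the spaces around the ordinary words "I" and "a" are preserved.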



